DOCUMENT RESUME



ED 193 939 



FL 011 904



AUTHOR          Benzon, William
TITLE           Computational Linguistics and Discourse Analysis.
PUB DATE        79
NOTE            13p.; Paper presented at the Annual Meeting of the
                North-East Modern Language Association (Hartford, CT,
                March 29-31, 1979).



EDRS PRICE      MF01/PC01 Plus Postage.
DESCRIPTORS     *Computational Linguistics; *Discourse Analysis;
                *Linguistic Theory; Models; Semantics



ABSTRACT

The profound use of the computer in discourse analysis must employ a
theory of discourse comprehension and production with which to conduct
the analysis. Models currently employed in computational linguistics
have a semantic basis and are goal-directed. The basic model is an
associative cognitive network. The basic inventory of concepts of the
system is given in the systemic network, which is organized into
paradigmatic, syntagmatic, and componential structures. Since events
happen in particular places at particular times, there is also an
episodic structure. The gnomonic system defines abstract concepts over
episodes. According to Phillips (1975), discourse coherence must be
considered on two levels, the episodic and the gnomonic. A discourse
which engenders episodic and/or gnomonic expectations which are not
then fulfilled is incoherent. A lower limit on coherence may be defined
as a discourse so ill-formed that it makes no sense even to its creator.
The upper limit on coherence is set by the most powerful creative minds.
Between the two limits, discourse analysis, from the point of view of
the computational linguist, probably requires nothing less than a
full-blown computational theory of the human mind. (JB)



* Reproductions supplied by EDRS are the best that can be made *
* from the original document.                                  *






COMPUTATIONAL LINGUISTICS AND DISCOURSE ANALYSIS
William L. Benzon






Language, Literature and Communications
Rensselaer Polytechnic Institute
Troy, New York 12181






Loosely speaking, computational linguistics is any use of the computer
in connection with the study of natural languages. But there is a superficial
use of the computer and a profound use. The superficial use deals only with
the surface of language and produces concordances, word counts, and statistical
analyses. The profound use treats the computer as a device for simulating
theories about the production and comprehension of discourse. This paper
is concerned only with the profound use (for a review of the field see
Benzon and Hays 1976).

I have only one point to make: The profound analysis of discourse must
employ a theory of discourse comprehension and production with which to
conduct the analysis. There is no inductive analytical procedure which one
can apply to texts and somehow magically "come up with" a theory of discourse.
Rather, one must first "come up with" a theory of discourse, no matter how
crude the theory might be (and our present theories are very crude), and
then see how well it performs with a body of texts. To evaluate the
model's performance one can simulate the model on a computer and one can
use it as a basis for psycholinguistic experimentation (Norman et al. 1975,
Kintsch 1974, Thorndyke 1977). One then creates a better model and it
must, in turn, be evaluated.



The models currently employed in computational linguistics have two
characteristics which are important:

1) They start with a semantic basis. Discourse is produced by operation
   on a semantic base and it is comprehended by assimilation into that
   semantic base.

2) Time is intrinsic to the model. Discourse production and
   comprehension are goal-directed, making constant use of projections
   and anticipations of upcoming elements in the speech stream.

To analyze a body of texts one must create a model of the semantic base
underlying the texts and of the operations on that base which generate
discourse. The analysis consists in the application of the model to the
body of texts.

Within this paradigm a theory of discourse is really a theory of the
inner structure and processes of the computational model. One is concerned,
not with texts per se, but with the processes by which people create and
understand texts. The model which I describe here has been developed by
David G. Hays and his students at SUNY Buffalo.

COGNITIVE NETWORKS

The basic model is an associative cognitive network. Imagine a spider's
web. The junctions between threads are called nodes while the threads are
arcs or links. Each node is a concept while the arcs specify the relationships
which exist between the concepts at either end of the arcs. Discourse is
produced by generating a path (or paths) through the network (imagine planning
a trip using a roadmap) and it is comprehended by assimilating a particular
path into the network.
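
The node-and-arc picture can be made concrete with a small sketch. The
code below is only an illustration under stated assumptions: the class
name CognitiveNetwork, the relation labels "property" and "part", and the
choice to treat arcs as undirected are all hypothetical and are not taken
from the Hays model itself.

```python
from collections import defaultdict


class CognitiveNetwork:
    """Toy associative network: nodes are concepts, arcs are labelled relations."""

    def __init__(self):
        # concept -> list of (relation label, neighbouring concept)
        self.arcs = defaultdict(list)

    def link(self, a, relation, b):
        """Add a labelled arc between concepts a and b (assumed undirected)."""
        self.arcs[a].append((relation, b))
        self.arcs[b].append((relation, a))

    def neighbours(self, concept):
        """Return the (relation, concept) pairs for arcs that meet at a node."""
        return list(self.arcs.get(concept, []))


net = CognitiveNetwork()
net.link("apple", "property", "red")      # hypothetical relation labels
net.link("apple", "property", "round")
net.link("apple tree", "part", "apple")

print(net.neighbours("apple"))
```

In this sketch the "meaning" of apple is nothing more than the list of
arcs that meet at its node, which is the relational view of semantics the
model adopts.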




Both semantics and syntax are embedded within a network structure.
(In this model pragmatics is essentially higher order semantics; see Bloom
and Hays, in preparation.) The syntactic network operates on the semantic
network. That is, processes in the semantic network are controlled by
the syntactic network.

Semantics is relational and, in a sense, spatial. The meaning of a
given node is specified by the place which it occupies in the entire network.
That place is given by the arcs which impinge on the node. Syntax is
temporal, placing one item after another in the speech stream. The job of
the syntactic network is to mediate between the spatial relationality of
the semantic network and the linear unfolding of the speech chain in which
only one term of a complex relational nexus can be given at a time.

THE SYSTEMIC NETWORK

The basic inventory of concepts of the system is given in the systemic
network, which is organized into paradigmatic, syntagmatic, and componential
structure.

There are two types of paradigms: substantive and functional. Items
are organized into substantive paradigms according to their sensory attributes.
One such paradigm has plant at its root. Tree, grass, herb, vine, and bush are
all varieties of plant and oak, pine, maple, sycamore, palm, ginkgo, etc.
are, in turn, varieties of tree. (Such paradigms have been examined on
a cross-cultural basis; see Berlin, Breedlove and Raven, 1973.)

Functional paradigms organize items according to their use. Foods
are those plants and animals which can be eaten. And foods can be classified
according to their methods of preparation, the ways in which they are eaten,
or their place in the menu.




Syntagmatic structure gives the relationship between properties, entities,
events, and plans. Redness, roundness, smoothness, are properties which
participate in the entity apple. When it falls from the tree the apple is
participating in the event fall. And when it is thrown at Johnny's head
the moving apple is participating in a plan, hit Johnny on the head.

Componential structure relates parts to wholes. A tree consists of
trunk, roots, branches, and foliage. Your typical bird has a head, a neck,
a body, two wings, two legs and a tail. The act of hitting a baseball includes
watching the ball, swinging the bat to the ball, the follow-through, and
watching the ball sail over the fence.

The relationship between any two concepts in the systemic network can
be established through tracing the path between the two nodes. Imagine
the network as a fisherman's net. The cords are the arcs and the knots
are the concepts. Grab the two nodes in question and pull tight; you have
found the shortest path between the nodes. The path between tree and
applesauce would consist of three links: 1) a paradigmatic link between
tree and apple tree; 2) a component link between apple tree and apple; and
3) a syntagmatic link between apple and mash (the process by which applesauce
is created). A paradigmatic link in a functional paradigm would link
applesauce to food. All of these links (and some others) would be traversed
in producing or comprehending the sentence: Applesauce is a food created by
mashing the fruit of the apple tree.
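
"Pulling the net tight" corresponds to a shortest-path search, and a
breadth-first search is one simple way to sketch it. The code below is an
illustration rather than the procedure used in the Hays model. The arc list
is transcribed from the applesauce example, except that the arc from mash
to applesauce is an assumption added so that applesauce is a reachable
node; the function names build and shortest_path are hypothetical.

```python
from collections import deque

# Labelled arcs from the applesauce example. The mash-applesauce arc is an
# assumption; the text folds it into "the process by which applesauce is
# created".
ARCS = [
    ("plant", "paradigmatic", "tree"),
    ("tree", "paradigmatic", "apple tree"),
    ("apple tree", "componential", "apple"),
    ("apple", "syntagmatic", "mash"),
    ("mash", "syntagmatic", "applesauce"),   # assumed arc
    ("applesauce", "paradigmatic", "food"),  # functional paradigm
]


def build(arcs):
    """Index undirected arcs as concept -> [(relation, neighbour), ...]."""
    graph = {}
    for a, rel, b in arcs:
        graph.setdefault(a, []).append((rel, b))
        graph.setdefault(b, []).append((rel, a))
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns [(relation, concept), ...] or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(rel, nxt)]))
    return None  # the two concepts are not connected


graph = build(ARCS)
path = shortest_path(graph, "tree", "mash")
print(path)
```

Searching from tree to mash recovers the three links named above, in
order: paradigmatic, componential, syntagmatic. With the assumed extra arc,
the same search can be continued to applesauce and on to food.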

EPISODIC STRUCTURE

Events happen in particular places at particular times. For this we
have episodic structure. It is all well and good to talk of apple mashing,
but what of that particularly fine apple mashing the Waltons had to celebrate
the publication of John Boy's first short story? That happened at a particular
time and in a particular place and so that record is kept in the episodic
store. The episodic store is thus the system's historical archive.

Typical episodic structures form the basis of frames (Minsky 1975) or
scripts and plans (Schank and Abelson 1977). The creation of good applesauce
is actually a moderately complicated affair, involving the coordination of
events in several different places at several different times. First one
must get the apples (from the orchard or from the grocer); that happens in
one place. Then the apples must be moved to another place where they are
washed. They are then moved (but perhaps not so far as the first time) to
a place where they are peeled (an optional step) and cored (not so optional).
After this they are simmered (a slightly different place) and then mashed (yet
another place). And that, roughly, is how you make applesauce. It is too
complicated to be handled by basic systemic structure. Rather, it is a
spatio-temporal organization of systemic structures.

The same episodic frame which is used to perform some activity can also
be used to produce and comprehend discourse about that activity. I use my
applesauce-making frame to create my little story about the applesaucing of
John Boy and you use your applesauce-making frame to comprehend my story.
As John Boy is in the orchard picking the apples you are using the
frame to anticipate the next step in the story, and then the step after that.
You match my story against your internalized applesauce-making frame and
when Grandpa empties a quart of vodka into the saucepan your attention is
aroused - that certainly is not in the applesauce-frame. And so you must
now embed your consequences-of-drink frame in the applesauce frame. This
causes you to anticipate that, at some point in the story, John Boy is going
to get drunk and do something he might regret.
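
Matching a story against a frame can be sketched as walking two sequences
in step. The following is a minimal illustration, assuming a frame is just
an ordered list of steps with an optional flag; the names APPLESAUCE_FRAME
and follow_story are hypothetical, and the sketch only flags the surprising
event. It does not attempt the further step of embedding a second frame.

```python
# Hypothetical applesauce-making frame: ordered (step, is_optional) pairs.
APPLESAUCE_FRAME = [
    ("get apples", False),
    ("wash", False),
    ("peel", True),      # the optional step
    ("core", False),
    ("simmer", False),
    ("mash", False),
]


def follow_story(frame, story):
    """Match story events against the frame, in order.

    Returns (surprises, missing): events the frame did not anticipate, and
    required steps that the story never supplied.
    """
    surprises = []
    matched = set()
    position = 0  # index of the next step the frame anticipates
    for event in story:
        # Look ahead from the current position for a step with this name.
        for i in range(position, len(frame)):
            if frame[i][0] == event:
                matched.add(i)
                position = i + 1
                break
        else:
            surprises.append(event)  # not in the frame: attention is aroused
    missing = [step for i, (step, optional) in enumerate(frame)
               if not optional and i not in matched]
    return surprises, missing


story = ["get apples", "wash", "core", "simmer", "add vodka", "mash"]
surprises, missing = follow_story(APPLESAUCE_FRAME, story)
print(surprises, missing)
```

Here the vodka is the only event the frame cannot place, and skipping the
optional peeling step raises no alarm. A fuller model would respond to the
surprise by bringing in another frame and adjusting its anticipations.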



Well, not quite. John Boy does get roaring drunk. And he reveals that
he had plagiarized the story from a friend of his. He's been feeling guilty
for weeks, but now that he's told the truth he feels better and he's going
to tell his friend what he did and make sure that the publisher lists the
name of the real author. He asks all to forgive him for what he's done.

Thus, "The Apple-Sousing of John Boy" is a story with a moral: only in
sincere repentance do the guilty find relief. This story involves abstract
concepts: guilt, repentance, justice. The agent of injustice feels guilt
and can find relief only in repentance.

To understand this we have to consider the next level of the system. 

THE GNOMONIC SYSTEM

The gnomonic system defines abstract concepts over episodes. The story
of John Boy is a particular example of repentance. There are many other
such stories. Within this particular system all abstract concepts are
defined over sets of episodes containing exemplary stories (Benzon 1976,
1978; Hays 1973, 1976; Phillips 1975, in press; White 1975). It is
particularly important to note that stories which themselves define a certain
abstract concept can contain abstract concepts. Thus one can talk of a
first rank abstraction as one defined over stories containing no abstract
concepts. A second rank abstraction is defined over stories which contain
at least one first rank abstraction. By continuing this process, which is
recursive, it is possible to build up concepts of indefinitely high abstractive
rank. There is some reason to believe that cultural evolution proceeds in
just this way (Benzon 1978).
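
The recursive definition of abstractive rank can be written down directly.
The sketch below rests on two assumptions that go beyond the text: rank is
generalized as one more than the highest rank found among the concepts in
the defining stories, and the small table of defining stories is invented
purely for illustration (the text does not say which rank guilt,
repentance, or justice actually has).

```python
# Hypothetical data: each abstract concept maps to its exemplary stories,
# and each story is represented as the set of concepts it contains.
DEFINING_STORIES = {
    "guilt": [{"John Boy", "story", "friend"}],
    "repentance": [{"John Boy", "guilt", "publisher"}],
    "justice": [{"repentance", "guilt", "author"}],
}


def rank(concept, defining=DEFINING_STORIES, _active=None):
    """Abstractive rank: 0 for concrete concepts, otherwise 1 + the highest
    rank among the concepts appearing in the defining stories."""
    if concept not in defining:
        return 0  # not defined over stories, so treated as concrete
    _active = _active or set()
    if concept in _active:
        raise ValueError(f"circular definition involving {concept!r}")
    inner = _active | {concept}
    contained = {c for story in defining[concept] for c in story}
    return 1 + max((rank(c, defining, inner) for c in contained), default=0)


ranks = {c: rank(c) for c in DEFINING_STORIES}
print(ranks)
```

With this invented table, guilt comes out as a first rank abstraction,
repentance as second rank, and justice as third rank. Nothing in the
recursion sets a ceiling, which is the point of the passage above.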

Let us consider another example.

3) Mary went into the woods and saw some pretty mushrooms. She picked
   them and returned home where she ate them. Shortly thereafter she
   became violently ill. Finally, she died.

4) Billy was playing in the yard. A big hairy spider came up and
   bit him. Not too long afterward he became violently ill and
   was unconscious for three days before he finally revived.

Both of these stories involve poison. But we have two senses of poison. The
physical substance, the mushroom, the spider's venom, is poison by functional
definition. Just as something is food by virtue of its capacity to be eaten,
so something is a poison by virtue of its capacity to fill a certain role
in stories such as 3) and 4) above.

But we also have an abstract concept of poison which emerges only
through consideration of the whole pattern. Abstractly considered,

5) Poison is an evil spirit which causes a person's soul to leave
   his body, temporarily or permanently.

Abstract poison is an ineffable substance which exists in certain physical
substances (namely, those functionally defined as poisons) which causes them
to have certain effects. Statement 5) is a rationalization of abstract
poison; it is an attempt to explain how poisons (functionally defined)
have their effect. Other elements of that rationalization must also be
abstractions (soul, and evil spirit (poison is just a variety of evil spirit)).

Thus, associated with every abstract concept we have the set of
exemplary episodes which illustrate the concept, which provides the primary
definitional basis for the concept, and the rationalization, which attempts
to explicate the concept and which is, as such, the secondary definitional
basis of the concept. Notice that this account is consonant with Thomas
Kuhn's notion of a paradigm (1970). The primary definitional basis of a
Kuhnian paradigm consists of exemplary experiments and problems. The
explicit rules of science are secondary to those examples. Those explicit
rules are, in my terminology, rationalizations.

DISCOURSE COHERENCE

According to Brian Phillips (1975, in press) discourse coherence must
be considered on two levels, the episodic and the gnomonic. At the episodic
level temporal, causal, and spatial relationships must form a coherent pattern.
One can't have John Boy picking apples in Twentieth Century Europe and
mashing them in Nineteenth Century Africa - at least not in the humble sort
of story I described. At the gnomonic level a discourse must have a theme,
that is, it must be an instance of some abstract concept. A discourse can
be coherent at the episodic level without having any significant gnomonic
structure - a straight historical chronicle (and I do mean very straight)
would be such a discourse. And gnomonic patterning may absorb apparent
anomalies at the episodic level - a rather staid science fiction story
can use time travel to have John Boy mash the apples even before he's been
born and a writer of contemporary metafiction might use a similar anomaly
for a different effect.

A discourse which engenders episodic and/or gnomonic expectations
which are not then fulfilled is incoherent. However, it is rarely the
case that all the information needed to understand a discourse is present
explicitly in the text. Much must be inferred. Consequently it is possible
that a discourse which is coherent for one person is incoherent for another.
If one doesn't have the knowledge necessary to make the proper inferences
on the basis of the information presented in a discourse, then the
discourse will appear to be incoherent without in fact being so. Coherence
is a property of the relationship between a given discourse and the semantic
base into which that discourse is being assimilated.

This is a fairly relativistic notion of coherence, but it isn't quite
equivalent to asserting that any discourse is coherent to someone. Consider
the fairly frequent situation where someone will make some notes, write
a few paragraphs or so, and then come back to that discourse a few hours,
days, weeks, etc. later and find the discourse completely unintelligible.
Here is a case of a discourse being incoherent in relationship to the semantic
base from which it was produced. For that matter, much of the difficulty
of writing coherent prose is precisely in the process of making discourse
coherent to the author (i.e. to the semantic base from which the discourse
is generated). Thus we are not left with the uninteresting notion that
any discourse is coherent to someone. Some discourses are so ill-formed
that they make sense to no one, not even their creators.

If that defines a lower limit to discourse coherence, then perhaps we
might consider what an upper limit might be like. It is no secret that
literary critics have widely divergent views on the meaning and significance
of literary texts. Norman Holland explains this by the concept of identity
theme (Holland 1975). Different people have different identity themes (i.e.
personalities) and so read texts differently; each reads according to his
own identity theme. Presumably, differences in identity theme could
be translated into differences in semantic bases, so I have no quarrel
with Holland. But I want to make a different suggestion.

Most of the texts studied by professional students of literature were
written by people whose mental and creative powers are probably greater
than those of their professional students. Thus what was coherent to the
artist might be incoherent to the critic whose powers vis-a-vis the text
are like those of the blind men vis-a-vis the elephant. This situation
would also lead to critical chaos. We accept this sort of principle when
we assign grade levels to texts intended for school children. Why not
apply it to ourselves? Perhaps we are 21st graders reading 25th grade texts.

The upper limits to discourse coherence are thus set by the most powerful
creative minds. Between the upper and the lower limits we have a cultural
community, a group of individuals whose various discourses are mutually
coherent in varying degrees. A discourse which falls below the lower limit
is coherent to no one. But the upper limit defines the degree to which
apparently conflicting discourses can become mutually coherent through
higher level patterning.

CONCLUSION

It is probably the case that, for the computational linguist, discourse
analysis requires nothing less than a full-blown computational theory of the
human mind. That is a tall order. And we are not close to filling it.
Indeed, if the human mind does in fact possess the recursive abstraction
power this model attributes to it, then the mind will always outstrip our
efforts to model it (for it will be constructing the model). But there is
much to be learned in attempting to create a computational theory of
the human mind. And the tools with which to create that theory are
available to those who would use them. The field of computational linguistics
is immature and rich in promise.




References 

Benzon, William L. Cognitive Networks and Literary Semantics. MLN 91: 952 -
    962, 1976.
Benzon, William L. Cognitive Science and Literary Theory. Unpublished
    Doctoral Dissertation, SUNY at Buffalo, 1978.
Benzon, William L. and David G. Hays. Computational Linguistics and the
    Humanist. Computers and the Humanities 10: 265 - 274, 1976.
Berlin, Brent, Dennis E. Breedlove, and Peter H. Raven. General Principles of
    Classification and Nomenclature in Folk Biology. American Anthropologist
    75: 214 - 242, 1973.
Bloom, David, and David G. Hays. Designation in English. In press.
Hays, David G. The Meaning of a Term is a Function of the Theory in Which
    It Occurs. SIGLASH Newsletter 6, No. 4: 8 - 11, 1973.
Hays, David G. On "Alienation": An Essay in the Psycholinguistics of Science.
    In R. Felix Geyer and David R. Schweitzer, eds. Theories of Alienation.
    Martinus Nijhoff, 1976, 169 - 187.
Holland, Norman. Five Readers Reading. New Haven: Yale University Press, 1975.
Kintsch, Walter. The Representation of Meaning in Memory. Hillsdale:
    Lawrence Erlbaum, 1974.
Kuhn, Thomas. The Structure of Scientific Revolutions. Chicago: University
    of Chicago Press, 1970.
Minsky, Marvin. A Framework for Representing Knowledge. In P.H. Winston, ed.
    The Psychology of Computer Vision. New York: McGraw-Hill, 1975.
Norman, Donald A., and David E. Rumelhart and the LNR Research Group.
    Explorations in Cognition. San Francisco: Freeman, 1975.



Phillips, Brian. Topic Analysis. Unpublished Doctoral Dissertation,
    SUNY Buffalo, 1975.
Phillips, Brian. A Model for Knowledge and Its Application to Discourse
    Analysis. American Journal of Computational Linguistics, in press.
Schank, Roger, and R. Abelson. Scripts, Plans, Goals and Understanding.
    Hillsdale: Lawrence Erlbaum, 1977.
Thorndyke, Perry. Pattern-Directed Processing of Knowledge from Texts.
    Rand Paper P-5806, May 1977.
White, Mary J. Cognitive Networks and Worldview: The Metaphysical Terminology
    of a Millenarian Community. Unpublished Doctoral Dissertation, SUNY Buffalo,
    1975.



